Serveur d'exploration MERS

Attention, ce site est en cours de développement !
Attention, site généré par des moyens informatiques à partir de corpus bruts.
Les informations ne sont donc pas validées.

Joker de Bruijn: Covering k-Mers Using Joker Characters

Identifieur interne : 000929 ( Main/Exploration ); précédent : 000928; suivant : 000930

Joker de Bruijn: Covering k-Mers Using Joker Characters

Auteurs : Yaron Orenstein ; Yun William Yu ; Bonnie Berger

Source :

RBID : PMC:6247992

Descripteurs français

English descriptors

Abstract

Abstract

Sequence libraries that cover all k-mers enable universal and unbiased measurements of nucleotide and peptide binding. The shortest sequence to cover all k-mers is a de Bruijn sequence of length \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\vert \Sigma { \vert ^k} + k - 1$$ \end{document}. Researchers would like to increase k to measure interactions at greater detail, but face a challenging problem: the number of k-mers grows exponentially in k, while the space on the experimental device is limited. In this study, we introduce a novel advance to shrink k-mer library sizes by using joker characters, which represent all characters in the alphabet. Theoretically, the use of joker characters can reduce the library size tremendously, but it should be limited as the introduced degeneracy lowers the statistical robustness of measurements. In this work, we consider the problem of generating a minimum-length sequence that covers a given set of k-mers using joker characters. The number and positions of the joker characters are provided as input. We first prove that the problem is NP-hard. We then present the first solution to the problem, which is based on two algorithmic innovations: (1) a greedy heuristic and (2) an integer linear programming (ILP) formulation. We first run the heuristic to find a good feasible solution, and then run an ILP solver to improve it. We ran our algorithm on DNA and amino acid alphabets to cover all k-mers for different values of k and k-mer multiplicity. Results demonstrate that it produces sequences that are very close to the theoretical lower bound.


Url:
DOI: 10.1089/cmb.2018.0032
PubMed: 30117747
PubMed Central: 6247992


Affiliations:


Links toward previous steps (curation, corpus...)


Le document en format XML

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Joker de Bruijn: Covering
<italic>k</italic>
-Mers Using Joker Characters</title>
<author>
<name sortKey="Orenstein, Yaron" sort="Orenstein, Yaron" uniqKey="Orenstein Y" first="Yaron" last="Orenstein">Yaron Orenstein</name>
<affiliation>
<nlm:aff id="aff1"></nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff2"></nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Yu, Yun William" sort="Yu, Yun William" uniqKey="Yu Y" first="Yun William" last="Yu">Yun William Yu</name>
<affiliation>
<nlm:aff id="aff3"></nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Berger, Bonnie" sort="Berger, Bonnie" uniqKey="Berger B" first="Bonnie" last="Berger">Bonnie Berger</name>
<affiliation>
<nlm:aff id="aff2"></nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff3"></nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">30117747</idno>
<idno type="pmc">6247992</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6247992</idno>
<idno type="RBID">PMC:6247992</idno>
<idno type="doi">10.1089/cmb.2018.0032</idno>
<date when="2018">2018</date>
<idno type="wicri:Area/Pmc/Corpus">000D97</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000D97</idno>
<idno type="wicri:Area/Pmc/Curation">000D97</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Curation">000D97</idno>
<idno type="wicri:Area/Pmc/Checkpoint">000554</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Checkpoint">000554</idno>
<idno type="wicri:source">PubMed</idno>
<idno type="RBID">pubmed:30117747</idno>
<idno type="wicri:Area/PubMed/Corpus">000806</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">000806</idno>
<idno type="wicri:Area/PubMed/Curation">000806</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">000806</idno>
<idno type="wicri:Area/PubMed/Checkpoint">000873</idno>
<idno type="wicri:explorRef" wicri:stream="Checkpoint" wicri:step="PubMed">000873</idno>
<idno type="wicri:Area/Ncbi/Merge">001F36</idno>
<idno type="wicri:Area/Ncbi/Curation">001F36</idno>
<idno type="wicri:Area/Ncbi/Checkpoint">001F36</idno>
<idno type="wicri:doubleKey">1066-5277:2018:Orenstein Y:joker:de:bruijn</idno>
<idno type="wicri:Area/Main/Merge">000932</idno>
<idno type="wicri:Area/Main/Curation">000929</idno>
<idno type="wicri:Area/Main/Exploration">000929</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Joker de Bruijn: Covering
<italic>k</italic>
-Mers Using Joker Characters</title>
<author>
<name sortKey="Orenstein, Yaron" sort="Orenstein, Yaron" uniqKey="Orenstein Y" first="Yaron" last="Orenstein">Yaron Orenstein</name>
<affiliation>
<nlm:aff id="aff1"></nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff2"></nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Yu, Yun William" sort="Yu, Yun William" uniqKey="Yu Y" first="Yun William" last="Yu">Yun William Yu</name>
<affiliation>
<nlm:aff id="aff3"></nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Berger, Bonnie" sort="Berger, Bonnie" uniqKey="Berger B" first="Bonnie" last="Berger">Bonnie Berger</name>
<affiliation>
<nlm:aff id="aff2"></nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff3"></nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Journal of Computational Biology</title>
<idno type="ISSN">1066-5277</idno>
<idno type="eISSN">1557-8666</idno>
<imprint>
<date when="2018">2018</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Algorithms</term>
<term>DNA (analysis)</term>
<term>DNA (genetics)</term>
<term>Gene Library</term>
<term>Genomics (methods)</term>
<term>Humans</term>
<term>Models, Genetic</term>
<term>Programming, Linear</term>
<term>Sequence Analysis, DNA (methods)</term>
<term>Software</term>
</keywords>
<keywords scheme="KwdFr" xml:lang="fr">
<term>ADN (analyse)</term>
<term>ADN (génétique)</term>
<term>Algorithmes</term>
<term>Analyse de séquence d'ADN ()</term>
<term>Banque de gènes</term>
<term>Génomique ()</term>
<term>Humains</term>
<term>Logiciel</term>
<term>Modèles génétiques</term>
<term>Programmation linéaire</term>
</keywords>
<keywords scheme="MESH" type="chemical" qualifier="analysis" xml:lang="en">
<term>DNA</term>
</keywords>
<keywords scheme="MESH" type="chemical" qualifier="genetics" xml:lang="en">
<term>DNA</term>
</keywords>
<keywords scheme="MESH" qualifier="analyse" xml:lang="fr">
<term>ADN</term>
</keywords>
<keywords scheme="MESH" qualifier="génétique" xml:lang="fr">
<term>ADN</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en">
<term>Genomics</term>
<term>Sequence Analysis, DNA</term>
</keywords>
<keywords scheme="MESH" xml:lang="en">
<term>Algorithms</term>
<term>Gene Library</term>
<term>Humans</term>
<term>Models, Genetic</term>
<term>Programming, Linear</term>
<term>Software</term>
</keywords>
<keywords scheme="MESH" xml:lang="fr">
<term>Algorithmes</term>
<term>Analyse de séquence d'ADN</term>
<term>Banque de gènes</term>
<term>Génomique</term>
<term>Humains</term>
<term>Logiciel</term>
<term>Modèles génétiques</term>
<term>Programmation linéaire</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<title>Abstract</title>
<p>
<bold>Sequence libraries that cover all
<italic>k</italic>
-mers enable universal and unbiased measurements of nucleotide and peptide binding. The shortest sequence to cover all
<italic>k</italic>
-mers is a de Bruijn sequence of length
<inline-formula>
<tex-math id="eq1" notation="LaTeX">\documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\usepackage{upgreek}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\vert \Sigma { \vert ^k} + k - 1$$ \end{document}</tex-math>
</inline-formula>
. Researchers would like to increase
<italic>k</italic>
to measure interactions at greater detail, but face a challenging problem: the number of
<italic>k</italic>
-mers grows exponentially in
<italic>k</italic>
, while the space on the experimental device is limited. In this study, we introduce a novel advance to shrink
<italic>k</italic>
-mer library sizes by using joker characters, which represent all characters in the alphabet. Theoretically, the use of joker characters can reduce the library size tremendously, but it should be limited as the introduced degeneracy lowers the statistical robustness of measurements. In this work, we consider the problem of generating a minimum-length sequence that covers a given set of
<italic>k</italic>
-mers using joker characters. The number and positions of the joker characters are provided as input. We first prove that the problem is NP-hard. We then present the first solution to the problem, which is based on two algorithmic innovations: (1) a greedy heuristic and (2) an integer linear programming (ILP) formulation. We first run the heuristic to find a good feasible solution, and then run an ILP solver to improve it. We ran our algorithm on DNA and amino acid alphabets to cover all
<italic>k</italic>
-mers for different values of
<italic>k</italic>
and
<italic>k</italic>
-mer multiplicity. Results demonstrate that it produces sequences that are very close to the theoretical lower bound.</bold>
</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Berger, M F" uniqKey="Berger M">M.F. Berger</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Blanchet Sadri, F" uniqKey="Blanchet Sadri F">F. Blanchet-Sadri</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chen, H Z" uniqKey="Chen H">H.Z. Chen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="D Addario, M" uniqKey="D Addario M">M. D'Addario</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Fordyce, P M" uniqKey="Fordyce P">P.M. Fordyce</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gurard Levin, Z A" uniqKey="Gurard Levin Z">Z.A. Gurard-Levin</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Orenstein, Y" uniqKey="Orenstein Y">Y. Orenstein</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Orenstein, Y" uniqKey="Orenstein Y">Y. Orenstein</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Philippakis, A A" uniqKey="Philippakis A">A.A. Philippakis</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="R Ih, K J" uniqKey="R Ih K">K.-J. Räihä</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ray, D" uniqKey="Ray D">D. Ray</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Smith, R P" uniqKey="Smith R">R.P. Smith</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Vassilevska, V" uniqKey="Vassilevska V">V. Vassilevska</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wyatt, B J" uniqKey="Wyatt B">B.J. Wyatt</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<affiliations>
<list></list>
<tree>
<noCountry>
<name sortKey="Berger, Bonnie" sort="Berger, Bonnie" uniqKey="Berger B" first="Bonnie" last="Berger">Bonnie Berger</name>
<name sortKey="Orenstein, Yaron" sort="Orenstein, Yaron" uniqKey="Orenstein Y" first="Yaron" last="Orenstein">Yaron Orenstein</name>
<name sortKey="Yu, Yun William" sort="Yu, Yun William" uniqKey="Yu Y" first="Yun William" last="Yu">Yun William Yu</name>
</noCountry>
</tree>
</affiliations>
</record>

Pour manipuler ce document sous Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000929 | SxmlIndent | more

Ou

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000929 | SxmlIndent | more

Pour mettre un lien sur cette page dans le réseau Wicri

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     PMC:6247992
   |texte=   Joker de Bruijn: Covering k-Mers Using Joker Characters
}}

Pour générer des pages wiki

HfdIndexSelect -h $EXPLOR_AREA/Data/Main/Exploration/RBID.i   -Sk "pubmed:30117747" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021